176 ◾ Bioinformatics
where $Y_{gj}$ is the observed read count of the gene g of interest in sample j, $K_j$ is the library size of sample j (the total number of aligned reads), and $Y_{gr}$ is the observed read count of the same gene g in the reference sample r, which has library size $K_r$.
Then the gene expression fold changes, $M_g$, are trimmed by 30%, and the weighted average of the remaining $M_g$ values is taken, using the inverse of the variance of each gene's read counts ($V_{g_i}$) as the weight, since log-fold changes from genes with larger read counts have lower variance on the logarithmic scale. The TMM adjustment, $f_j$, is given as
$$f_j = \frac{\sum_{i}^{n} V_{g_i} M_{g_i}}{\sum_{i}^{n} V_{g_i}} \tag{5.5}$$
The adjustment $f_j$ is an estimate of the relative RNA production of the two samples. The TMM normalization factor for sample j with m genes is given by
$$N_j = f_j \sum_{g=1}^{m} Y_{gj} \tag{5.6}$$
TMM does not correct the observed read counts for gene length and hence is not suitable for comparing the expression of different genes within the same sample.
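The TMM steps above can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not the implementation used by any particular package: the function names are hypothetical, the variance estimate used for the weights is an assumption (the text does not give one), the trim is assumed to remove 30% of the genes from each tail of the sorted fold changes, and the weighted mean is assumed to be converted back from the log2 scale before Equation 5.6 is applied.

```python
import math


def tmm_log2_adjustment(sample, reference, trim=0.30):
    """Weighted trimmed mean of per-gene log2 fold changes (simplified)."""
    k_j, k_r = sum(sample), sum(reference)  # library sizes K_j and K_r
    records = []
    for y_gj, y_gr in zip(sample, reference):
        if y_gj == 0 or y_gr == 0:
            continue  # the log ratio is undefined for zero counts
        m = math.log2((y_gj / k_j) / (y_gr / k_r))
        # Assumed variance estimate for the log ratio (not given in the text).
        var = (k_j - y_gj) / (k_j * y_gj) + (k_r - y_gr) / (k_r * y_gr)
        if var > 0:
            records.append((m, 1.0 / var))  # weight = inverse variance
    records.sort(key=lambda rec: rec[0])
    cut = int(len(records) * trim)  # assumption: trim this fraction per tail
    kept = records[cut:len(records) - cut]
    if not kept:
        raise ValueError("no genes left after trimming")
    total_weight = sum(w for _, w in kept)
    return sum(w * m for m, w in kept) / total_weight  # cf. Equation 5.5


def effective_library_size(sample, reference, trim=0.30):
    """Scale the raw library size by the adjustment (cf. Equation 5.6)."""
    # Assumption: the log2-scale mean is converted back before scaling.
    f_j = 2.0 ** tmm_log2_adjustment(sample, reference, trim)
    return f_j * sum(sample)


reference = [10, 20, 30, 40, 50, 60, 70, 80, 90, 100]
sample = reference[:-1] + [1000]  # one strongly up-regulated gene
print(effective_library_size(sample, reference))
```

In this toy input the single highly expressed gene inflates the raw library size of the sample, but it falls in the trimmed tail, so the adjusted library size stays close to that of the reference.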
5.3.5.5 Relative Expression
For a given sample j, the relative expression (RE) scaling factor is calculated as the median, over all genes, of the ratios of the observed counts to the geometric mean across all samples (the pseudo-reference sample, r) [28]. The scaling factor is calculated as follows:
$$N_j = \operatorname{median}_{g} \frac{Y_{gj}}{\left( \prod_{r=1}^{n} Y_{gr} \right)^{1/n}} \tag{5.7}$$
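As a minimal sketch of Equation 5.7, the following Python function computes one scaling factor per sample from a genes-by-samples table of counts. The function name and the input layout (a list of per-gene rows) are hypothetical, and skipping genes with a zero count in any sample is an assumption made here because the geometric mean would otherwise be zero.

```python
import math
import statistics


def re_scaling_factors(counts):
    """counts: list of genes, each a list of counts, one per sample."""
    n_samples = len(counts[0])
    ratios = [[] for _ in range(n_samples)]
    for row in counts:
        if any(y <= 0 for y in row):
            continue  # assumption: skip genes whose geometric mean is zero
        # Geometric mean across samples, computed on the log scale.
        geo_mean = math.exp(sum(math.log(y) for y in row) / n_samples)
        for j, y in enumerate(row):
            ratios[j].append(y / geo_mean)
    # Median over genes of the ratios, one value per sample.
    return [statistics.median(r) for r in ratios]


counts = [[10, 20], [100, 200], [5, 10], [0, 7]]
print(re_scaling_factors(counts))
```

Here the second sample has exactly twice the counts of the first for every retained gene, so the two factors differ by a factor of two.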
5.3.5.6 Upper Quartile
The upper quartile (UQ) normalization factor is computed as the sample upper quartile
(75th percentile) of gene counts for the genes with no zero counts for all samples [29].
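A short Python sketch of the upper quartile factor follows. The function name is hypothetical, and two choices are assumptions: the filter is read as dropping genes whose count is zero in every sample, and the 75th percentile uses the "inclusive" interpolation of Python's `statistics.quantiles`, which may differ slightly from the convention of a given package.

```python
import statistics


def upper_quartile_factors(counts):
    """counts: list of genes, each a list of counts, one per sample."""
    # Assumption: drop genes whose count is zero in every sample.
    kept = [row for row in counts if any(y > 0 for y in row)]
    n_samples = len(counts[0])
    factors = []
    for j in range(n_samples):
        column = [row[j] for row in kept]
        # quantiles(..., n=4) returns the three quartile cut points.
        quartiles = statistics.quantiles(column, n=4, method="inclusive")
        factors.append(quartiles[2])  # third cut point = 75th percentile
    return factors


counts = [[1, 10], [2, 20], [3, 30], [4, 40], [5, 50], [0, 0]]
print(upper_quartile_factors(counts))
```

If the stricter reading is intended (keep only genes that are nonzero in every sample), replacing `any` with `all` in the filter is the only change needed.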
5.3.6 Differential Expression Analysis
Most differential expression programs have functions that normalize gene expression
count data as part of the analysis. The design of the study is crucial in the differential
expression analysis as we discussed in the introduction of this chapter. The conditions
that group the samples and sample replicates must be determined as metadata before the
analysis. A simple gene expression study is made up of two groups (e.g., treated and control